Annals of Human Genetics

doi: 10.1111/j.1469-1809.2009.00556.x

Inferring Continental Ancestry of Argentineans 

from Autosomal, Y-Chromosomal and Mitochondrial DNA 

Daniel Corach 1 *, Oscar Lao 2 , Cecilia Bobillo 1 , Kristiaan van Der Gaag 3 , Sofia Zuniga 3 , 

Mark Vermeulen 2 , Kate van Duijn 2 , Miriam Goedbloed 2 , Peter M. Vallone 4 , Walther Parson 5 , 

Peter de Knijff 3 and Manfred Kayser 2 

1 Servicio de Huellas Digitales Geneticas and Catedra de Genetica y Biologia Molecular, Faculty of Pharmacy and Biochemistry, 

University of Buenos Aires, Argentina 

2 Department of Forensic Molecular Biology, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands 

3 Department of Human and Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands 

4 Biochemical Science Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA 
5 Institute of Legal Medicine, Innsbruck Medical University, Austria 



Summary 

We investigated the bio-geographic ancestry of Argentineans, and quantified their genetic admixture, analyzing 246 
unrelated male individuals from eight provinces of three Argentinean regions using ancestry-sensitive DNA markers 
(ASDM) from autosomal, Y and mitochondrial chromosomes. Our results demonstrate that European, Native American 
and African ancestry components were detectable in the contemporary Argentineans, the amounts depending on the 
genetic system applied, exhibiting large inter-individual heterogeneity. Argentineans carried a large fraction of European 
genetic heritage in their Y-chromosomal (94.1%) and autosomal (78.5%) DNA, but their mitochondrial gene pool is 
mostly of Native American ancestry (53.7%); instead, African heritage was small in all three genetic systems (<4%). 
Population substructure in Argentina considering the eight sampled provinces was very small based on autosomal (0.92% 
of total variation was between provincial groups, p = 0.005) and mtDNA (1.77%, p = 0.005) data (none with NRY data), 
and all three genetic systems revealed no substructure when clustering the provinces into the three geographic regions 
to which they belong. The complex genetic ancestry picture detected in Argentineans underscores the need to apply 
ASDM from all three genetic systems to infer geographic origins and genetic admixture. This applies to all worldwide 
areas where people with different continental ancestry live geographically close together. 

Introduction 

Establishing reliable genetic knowledge about bio-geographic 
ancestry, the degree of admixture and the extent of population 
substructure is of relevance mostly in the field of epidemio- 
logical studies, but can also be useful in the forensic context 
and is additionally interesting from a historical point of view. 
Argentineans are usually considered as a population of strict 
European ancestry; however, historical evidence suggests that 
this may not be the case. For instance, a census in 1869 showed 
that the population of Argentina involving 1,756,000 people 
was composed of “criollos”: individuals of European descent 
born outside the original European countries, “mestizos”: in- 
dividuals of mixed Native American and European descent, 
“zambos”: individuals of mixed Native American and African 
descent, and “mulatos”: individuals of mixed European and 
African descent, in addition to people of assumed unmixed 
African and Native American ancestry. It can be assumed 
that three major admixture episodes happened during Argen- 
tinean population history. The first involved Native Ameri- 
cans and Western Europeans (mostly Spaniards) and started 
soon after the first arrival of the Spanish conquistadores in 
the 16th century. The second admixture episode additionally 
included West-Africans, and began after African slaves were 
first introduced to the territory in the late 16th century, with 
constant influx until 1810. Finally, a third major admixture 
period involved two sides: the already admixed Argentinean 
population consisting of non-mixed or already mixed Native 
American, West-European (mostly Spanish) and West-African 
individuals on one hand, and on the other a large number of 
Europeans who entered Argentina between 1856 and 1930. 
These late European immigrants mostly came from Italy and 
Spain (over 76%), followed by French, Polish, Russians, and 
Germans (approx. 13%), representing a major migration wave 
of over 5,700,000 immigrants entering the country, of which 
3,000,000 settled down in Argentina. Notably, in 1914 more 
than one third of Argentineans were born outside the country 
(Martinez Sarasola, 2005). Much more recently, people from 
various countries in Africa, Asia, and Europe as well as peo- 
ple of various ancestries from neighbouring South American 
countries entered Argentina, adding to the human diversity 
of the country. 

Although some expectations about the continental back- 
ground of Argentineans can be formulated based on census 
information, such information does not provide quantitative 
estimates on the degree of admixture in the contemporary Ar- 
gentinean population. Molecular genetics offers suitable tools 
to investigate bio-geographic ancestry in detail including the 
detection and quantification of admixture proportions. Some 
information has been published about the genetic make-up of 
Argentineans either employing markers from autosomal DNA 
(Sala et al., 1998, 1999; Marino et al., 2006a, b,c; Seldin et al., 
2007), from the non-recombining part of the Y-chromosome 
(NRY) (Kayser et al., 1997; Corach et al., 2001; Kayser et al., 
2001; Marino et al., 2007), or, to a lesser degree, also from 
mitochondrial DNA (mtDNA) (Ginther et al., 1993; Corach 
et al., 1997; Bobillo et al., 2009), but reliable inferences on 
bio-geographic ancestry are limited. Also, the combined anal- 
ysis of uni-parentally and bi-parentally inherited markers in 
the same individuals has rarely been done (Martinez Marignac 
et al., 2004; Corach et al., 2006; Salas et al., 2008). There- 
fore we analyzed 249 unrelated males from eight provinces of 
three Argentinean regions by means of DNA markers from 
the autosomal, Y-chromosomal and mitochondrial parts of 
the human genome suitable for detecting continental ori- 
gins. In addition to genetic markers, we also used paternal 
surnames as culturally-transmitted markers, to further extend 
our bio-geographic ancestry analyses in the Argentinean population, 
similar to one previous study on Colombians (Bedoya 
et al., 2006). It should be noted that in Argentina, in contrast 
to most South American countries, only paternal, and not 
maternal, family names are used. 



Materials and Methods 

Samples 

Samples were collected at the Servicio de Huellas Digitales 
Geneticas (DNA Fingerprinting Service), School of Pharmacy 
and Biochemistry, University of Buenos Aires, Argentina. Sampling 
took place during the period 2005-2007 and included 
unrelated male volunteer donors, who participated in pater- 
nity testing and signed written consent statement forms, ap- 
proved by the local Ethical Committee. Personal information 
was treated anonymously. Blood samples were obtained by finger 
puncture and spotted onto FTA paper. DNA extraction was per- 
formed following the manufacturer’s protocol (FTA, Whatman, 
www.whatman.com/DNACollection.aspx). Initially, 249 sam- 
ples were ascertained, however one individual with a Japanese 
surname and two others with Middle Eastern surnames were ex- 
cluded, leaving 246 samples from individuals with European and 
Native American surnames in the study. Samples came from eight 
provinces from three geographical regions of the country (Fig. 1): 
Formosa (AFO, N = 11), Chaco (ACA, N = 1), Misiones (AMI, 
N = 28) and Corrientes (ACO, N = 21) from the north-eastern 
Argentinean region (N = 61); Santa Fe (ASF, N = 3) and Buenos 
Aires (ABS, N = 150) from the central Argentinean region (N 
= 153), as well as Rio Negro (ARN, N = 31) and Chubut 
(ACH, N = 1) from the southern Argentinean region (N = 32). 
Sample size was chosen as an approximation to match the relative 
contribution of each region to the entire Argentinean popula- 
tion (National Institute of Statistics and Censuses, INDEC 2001 
www.indec.mecon.ar) to achieve an approximate representation 
of the Argentinean population. Although this resulted in some 
of the provinces having too small sample sizes, all samples were 
used, especially in the framework of the regional approach and 
when considering Argentina as a whole population. At the time 
of sampling the most likely geographic origin of the surname of 
all the participants was recorded. However, for the purpose of this 
study no record of the surname itself but only its geographic ori- 
gin was used to assure anonymous treatment. In particular, indi- 
vidual surnames were inspected for likely European (comprising 
Belgium, Croatia, France, Germany, Greece, Italy, Netherlands, 
Poland, Portugal, Spain, Russia, United Kingdom and Ukraine), 
African, Asian, Middle East and Native American origin. Ge- 
ographical origin of the non-Native American surnames was 
assessed according to Hanks & Hodges (1989). Surnames of Na- 
tive American origin were identified using linguistic knowledge, 
mainly by detecting linguistic elements within the surnames of 
known Amerindian origins such as Mapudungun, Diaguita, etc. 
In addition, we ascertained from the Human Genome Diversity 
Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH: 
http://www.cephb.fr/en/hgdp/diversity.php/) samples 
of those individuals that represent the closest geographic relatives 
of the most likely true parental populations for Argentineans (Na- 
tive South Americans, diverse Europeans, and West Sub-Saharan 
Africans) given existing knowledge of the Argentinean history 
(Rock, 1987). These were in particular 29 French, 24 Basques, 
47 Italians comprising 11 from Bergamo, 28 from Sardinia, and 8 
from Tuscany, 16 Orcadian Islanders from Great Britain, 25 Russians 
and 17 from Adygei in the Caucasus region (i.e., all 158 
Europeans included in HGDP). Also sampled were 23 Karitiana 
and 21 Surui from Brazil (i.e., all 44 Native South-Americans 
included in HGDP) as well as 24 Mandenka from Senegal and 
25 Yoruba from Nigeria (i.e., all 49 West Sub-Saharan Africans 
included in HGDP). 

Figure 1 Map of Argentina indicating the provinces and regions sampled. 

Genotyping 

Autosomal DNA 

Twenty-four autosomal SNPs were ascertained from a pool of 
62 pre-selected markers by applying a genetic algorithm for 
maximising the amount of non-redundant continental ances- 
try information per single marker as described in Lao et al. 
(2006). Fifty-six of these markers had been previously ascertained 
from a dataset of >10,000 SNPs generated using the Affymetrix 
GeneChip® Human Mapping 10K Array Xba 131 (Mapping 
10K array) in the Y Chromosome Consortium (YCC) cell line 
panel. They comprised the most promising continental ancestry 
markers ascertained from the YCC dataset when applied to the 
HGDP-CEPH samples as described elsewhere (Lao et al., 2006; 
Kersbergen et al., 2009). The remaining six markers were ascer- 
tained from pigmentation candidate genes and had shown a strong 
continental population differentiation in the HGDP-CEPH sam- 
ples as described in Lao et al. (2007). SNP ascertainment was 
focused to enrich for four continental ancestry components: 
Sub-Saharan Africa, East-Asia, Eurasia, and Native America. 
The following SNPs were used here: rs1876482, rs2179967, 
rs1048610, rs1371048, rs1478785, rs1369290, rs952718, 
rs1405467, rs1344870, rs1391681, rs1461227, rs1907702, 
rs2052760, rs714857, rs721352, rs722869, rs926774, rs1448484, 
rs1667751, rs1858465, rs1465648, rs16891982, rs1808089, 
rs3843776. Genotyping was performed in two multiplex SNaPshot 
reactions based on the principle of primer extension, as will 
be described in detail elsewhere (Vallone et al., in preparation). 
SNP genotypes were scored by two independent analysts and 
finally reviewed by a third one. 

Mitochondrial DNA 

The entire control region (CR) of human mtDNA from 
nucleotide positions 16024 to 576 was sequenced following 
EMPOP recommendations as described in detail elsewhere 
(Brandstatter et al., 2007; updated in Parson & Bandelt, 2007). 
The sequences were aligned to the revised Cambridge Refer- 
ence Sequence (rCRS; Andrews et al., 1999) using Sequencher 
v. 4.8 (GeneCodes, Ann Arbor, MI, USA), following updated 
nomenclature guidelines (Bandelt & Parson, 2008). All samples 
were evaluated twice by two independent analysts and results 
were compared using in-house software and finally reviewed 
by a third analyst. Furthermore, particular coding-region SNPs 
were analysed using a modified version (Amry et al., in preparation) 
of a previously published assay (Alvarez-Iglesias et al., 2007) 
and by direct sequence analysis to detail the haplogroup affiliations 
in cases where the CR sequences did not provide sufficient 
information for reliable haplogroup designation. In particular, 
samples belonging to Asian and Native American lineages were 
SNP analysed at positions 12468 (hg A2c); 6755 (hg B2b); 1888 
(hg C1c); 7697 (hg C1d); 8383, 8419, 9431 (hg D4c2), and 
5319 (hg D4c2a), whereas samples belonging to West Eurasian 
lineages were analysed at positions 7028, 2706 (hg H); 9716 (hg 
K2); 14798 (hg J1c); 4580 (hg V); and 12308, 12372 (hg U). 

Y-chromosome DNA 

Variation of the non-recombining part of the human Y- 
chromosome (NRY) was identified by means of 44 NRY-SNPs 
in total. Twenty-four NRY-SNPs were genotyped in all samples 
(including: SRY 1532, M91, M168, M145, M174, 12f2, M96, 
M213, M201, M69, M52, M170, M172, M9, M20, M106, 
M214, Tat, M175, M45, MEH2, M207, M269, and M124). 
Aiming to maximise continental differentiation of haplogroup 
origins we additionally genotyped 20 NRY-SNPs on subsets of 
samples based on the results from the 24 SNP analyses. M3 
was genotyped on samples with the derived allele of MEH2, 
M242 among samples of haplogroup P(xQ1a, R), and eighteen 
additional SNPs among samples identified as belonging to 
haplogroup E (M33, P2, M2, M154, M191, M215, M35, M78, 
V12, M224, V32, V13, V22, M81, M123, M281, V6, and M75). 
A single multiplex SNaPshot assay using the principle of primer 
extension was designed for a core set of 24 NRY-SNPs. Primer 
sequences for M45, M52, M170, M172, M173, M175, M213, 
12f2 and SRY 1532 were taken from the literature (Sanchez 
et al., 2003). For the remaining NRY markers, reference se- 
quences for each locus were obtained from the BLAST human 
genome database (http://www.ncbi.nlm.nih.gov/blast/) and 
PCR-primers were designed for fragments ranging from 70 to 
225 bp in length using Primer 3 v.0.2 (http://frodo.wi.mit.edu) 
with default settings. Lengths of designed primers ranged from 
19 to 27 nucleotides; primers with five or more bases at the 3' 
end complementary to part of another primer in the multiplex 
were discarded or redesigned to avoid primer-dimer formation. 
Amplicon sequences were checked with BLAST for sequence 
homology in the human genome (all primer information can 
be found in Supporting Table S3). Extension primers were de- 
signed using Assay Design Software Version 1.0.6 (Biotage, Up- 
psala, Sweden). Primers with four or more bases at the 3' end 
complementary to part of another primer in the multiplex were 
discarded or redesigned to avoid non-specific primer-extension. 
To obtain distinct fragment lengths, the multiplex 
primer lengths were altered by adding a piece of a “neutral” sequence 
or a poly-C tail as described by Sanchez et al. (2003). 
Each primer pair was first validated in a singleplex PCR containing 
0.5 ng template DNA from a selection of samples (including 
a female control), 1× PCR buffer containing 1.5 mM MgCl2 
(Applied Biosystems, Foster City, CA, USA), 100 µM of each 
dNTP (GE, the Netherlands), 0.4 µM of each desalted primer 
(Biolegio, Nijmegen, the Netherlands) and 0.6 units of AmpliTaq 
Gold® DNA polymerase (Applied Biosystems). In the final 
multiplex PCR, 0.5 ng template DNA was amplified in a 12.5 µl 
reaction volume containing 1× PCR buffer, 6.5 mM total 
MgCl2, 200 µM of each dNTP and 2.5 units of AmpliTaq 
Gold® DNA polymerase. During multiplex validation primer 
concentrations were adjusted (0.1-0.4 µM) to achieve optimally 
balanced signal intensity for all markers. All initial PCRs were 
performed in a GeneAmp 9700 thermal cycler (Applied Biosystems) 
with an initial denaturation at 94°C for 10 min followed 
by 35 cycles of 30 s at 94°C, 30 s at 60°C, 30 s at 72°C and a 
final extension for 5 min at 72°C. To eliminate excess primers 
and dNTPs, 2 µl ExoSAP-IT® (USB, Affymetrix, Cleveland, 
USA) was added and incubated at 37°C for 30 min, followed 
by a final enzyme inactivation at 80°C for 15 min. Extension 
reactions were performed in a 5 µl reaction volume using 1 µl 
purified PCR product, 2.5 µl of SNaPshot multiplex Ready 
Reaction Mix (Applied Biosystems) and 0.4 µM primer (HPLC 
or PAGE purified). During multiplex validation primer 
concentrations were optimized (0.06-0.5 µM) for balanced signal 
intensity. All reactions were performed using a GeneAmp 9700 
thermal cycler with an initial denaturation at 96°C for 2 min, 
followed by 25 cycles of 10 s at 96°C, 5 s at 50°C and 30 s 
at 60°C. To eliminate unincorporated ddNTPs, 1.25 µl SAP® 
reagent (USB) was added and incubated at 37°C for 1 hour. SAP 
was inactivated by incubation at 75°C for 15 min. 2 µl of the 
SAP-treated extension product was analysed with an ABI3100 
Genetic Analyzer using a 36 cm capillary array, polymer POP4 
and Genescan 120 LIZ as internal size standard. Data were 
analyzed using GeneMapper ID v3.2.1 software (Applied Biosystems). 
After background subtraction and colour separation, peaks 
were sorted into bins according to sizes by comparison to the 
internal size standard. An Excel sheet was used to transfer exported 
allele tables and for automatic haplogroup assignments. 
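
The automatic haplogroup assignment mentioned above can be illustrated with a minimal sketch. It is not the authors' spreadsheet logic. It assumes a heavily pruned marker tree containing only a few of the typed NRY-SNPs (intermediate markers are omitted), labels each lineage by its terminal derived marker rather than by a haplogroup name, and uses the hypothetical helper names `PARENT`, `lineage` and `assign_haplogroup`.

```python
# Hypothetical, heavily pruned marker tree: marker -> parent marker.
# Intermediate markers are omitted; the full phylogeny is in Karafet et al. (2008).
PARENT = {
    "M168": None,
    "M96": "M168",
    "M2": "M96",
    "M9": "M168",
    "M45": "M9",
    "MEH2": "M45",
    "M3": "MEH2",
    "M207": "M45",
    "M269": "M207",
}


def lineage(marker):
    """Return the path from a marker up to the root, marker first."""
    path = []
    while marker is not None:
        path.append(marker)
        marker = PARENT[marker]
    return path


def assign_haplogroup(calls):
    """Return the deepest derived marker, or None if no marker is derived.

    calls maps marker name -> "derived" or "ancestral"; untyped markers
    are simply absent. Inconsistent profiles raise ValueError.
    """
    derived = [m for m, state in calls.items() if state == "derived"]
    if not derived:
        return None
    terminal = max(derived, key=lambda m: len(lineage(m)))
    path = lineage(terminal)
    for marker in derived:
        if marker not in path:
            raise ValueError(f"{marker} is derived but off the {terminal} lineage")
    for marker in path:
        if calls.get(marker) == "ancestral":
            raise ValueError(f"{marker} is ancestral but lies above {terminal}")
    return terminal


# Illustrative profile (not real sample data).
profile = {"M168": "derived", "M9": "derived", "M45": "derived",
           "M207": "derived", "M269": "derived", "M96": "ancestral"}
print(assign_haplogroup(profile))
```

Raising an error on inconsistent profiles, rather than silently picking a lineage, flags samples that would need re-typing.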

Statistical analyses 

NRY haplogroups were derived from genotyping of Y-SNPs 
using the marker phylogeny as described elsewhere (Karafet 
et al., 2008). Mitochondrial DNA haplogroups were inferred 
from sequence data of the complete CR with the additional 
information of coding SNPs if needed (see above). The geo- 
graphic origin of the haplogroups was assumed from published 
NRY (Bortolini et al., 2003; Jobling & Tyler-Smith, 2003; Luis 
et al., 2004; Semino et al., 2004; Cruciani et al., 2007) and 
mtDNA data (Richards et al., 1998; Macaulay et al., 1999; Finnila 
et al., 2001; Kivisild et al., 2006; Kong et al., 2006; Achilli et al., 
2008; Behar et al., 2008) and the Argentinean samples were 
grouped accordingly. STRUCTURE (Pritchard et al., 2000) 
was run with a burn-in of 50,000 iterations, retaining the 
next 50,000 Markov chain Monte Carlo iterations for final analyses. 
Three parental populations were assumed and frequencies were 
updated according to the observed frequencies in these three 
populations. Ten different replicates were performed and con- 
vergence, mixing and reproducibility of the different runs were 
checked. Multidimensional scaling (MDS) plots were obtained 
using SPSS 15.0 (SPSS for Windows, Rel. 15.0.1. 2006. SPSS 
Inc., Chicago, Illinois, USA). Distruct 1.1 (Rosenberg, 2004) 
was used to tune the output from STRUCTURE and Grapher 7 
(http://www.goldensoftware.com) was used to perform a ternary 
plot of the most likely amount of ancestry of each parental pop- 
ulation per individual. Additional analyses were carried out with 
the most likely proportion of Native American ancestry esti- 
mated by STRUCTURE. Similarities in the amount of Native 
American ancestry between clusters of individuals (regional as- 
signment, geographic origin of the surname of each individual) 
were tested by means of Kruskal-Wallis and Mann-Whitney tests 
computed with SPSS 15.0. Similarities in the proportion of ge- 
ographic ancestry for mtDNA and NRY were tested by means 
of Fisher exact test as implemented in SPSS 15.0. In order to 
quantify how much of the genetic variation of the autosomal 
markers was explained under particular individual assignments, 
an AMOVA (Excoffier et al., 1992) was computed. Two 
different individual clustering scenarios were considered: i) regional 
assignment and ii) surname assignment. 
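
As an illustration of the Kruskal-Wallis comparison described above, the following is a minimal pure-Python sketch rather than the SPSS procedure used in the study. The ancestry values are hypothetical toy numbers, and the p value uses the chi-square approximation for exactly three groups (2 degrees of freedom), where the survival function reduces to exp(-H/2).

```python
import math


def average_ranks(values):
    """1-based ranks of values, with ties given their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction  # undefined if every value is identical


# Hypothetical Native American ancestry proportions for three regions.
north_east = [0.10, 0.22, 0.35, 0.15]
central = [0.05, 0.12, 0.08, 0.20]
southern = [0.30, 0.45, 0.25, 0.60]
h = kruskal_wallis_h([north_east, central, southern])
p = math.exp(-h / 2)  # chi-square survival function, valid for 2 d.f. only
print(f"H = {h:.3f}, approximate p = {p:.4f}")
```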



Results 

Genetic Ancestry and Admixture 
of Argentineans 

Autosomal DNA 

Individual clustering analyses were performed using genetic 
data from 24 autosomal ancestry-sensitive SNPs in 246 Ar- 
gentineans from eight provinces and three regions of the 
country (Fig. 1). We additionally typed these SNPs in 158 
Europeans (all European HGDP-CEPH samples), 44 Na- 
tive South Americans (Brazilian Karitiana and Surui HGDP- 
CEPH samples) and 49 West Sub-Saharan Africans (Man- 
denka and Yoruba HGDP-CEPH samples). The latter three 
groups were included as parental populations in the STRUC- 
TURE analysis together with the Argentinean data to as- 
sess the degree of continental geographic ancestry in the 
Argentinean samples. These parental reference samples have 
been specifically ascertained because the respective groups 
are geographically most closely related to the expected true 
parental populations for Argentineans (Rock, 1987) from the 
global reference data of the HGDP-CEPH samples available 
to us. 

A MDS analysis performed with a matrix of identity-by- 
state distances between pairs of individuals and two dimen- 
sions revealed that the three parental populations showed sim- 
ilar genetic distances between them, indicating similar power 
to detect genetic ancestry of all three parental groups in the 
Argentineans (Fig. 2). As evident from the plot, most of the 
Argentinean samples clustered with or closest to Europeans, 
some appeared between Europeans and Native Americans 
indicating some degree of genetic admixture between these 
two groups, three samples clustered close to Native Americans, 
and no Argentinean sample appeared close to Africans 
(Fig. 2). In a STRUCTURE analysis using the same three 
parental populations we observed a similar pattern, European 
ancestry for most Argentinean samples, but also a considerable 
fraction of Native American ancestry in a number of Argen- 
tinean samples (Fig. 3). Overall across Argentinean samples, 
the mean ancestry components as revealed from the STRUCTURE 
analysis were 78.6% (95% confidence interval (CI) 
ranging from 31.5% to 96.6%) for European, 17.3% (95% CI 
from 1.5% to 63.8%) for Native American, and 4.2% (95% 
CI ranging from 1.1% to 19.0%) for West-African ancestry 
(Table 1). 



Figure 2 MDS analysis plot of the Argentinean samples 
together with three assumed parental populations: Europeans, 
Native South Americans and West Sub-Saharan Africans. 




Furthermore, large admixture heterogeneity between individuals 
was observed, especially involving the Native American 
and European components, ranging from ~0% Native 
American and ~90% European ancestry to 
~80% Native American and ~5% European 
ancestry (see Supporting Fig. S1). Both ancestry components 
are strongly negatively correlated in the Argentinean 
samples (slope = -1.154, p value of the slope < 2e-16, Pearson 
correlation r-squared = 0.887, p value < 2.2e-16 for a linear 
regression with the logit values of Native American and European 
ancestry). In contrast, African ancestry is poorly correlated 
with either the amount of European ancestry (Pearson 
r-squared: 0.043, p value = 0.00061) or the Native American 
ancestry component (Pearson correlation r-squared: 0.0035, 
p value = 0.354) in the Argentinean samples. AMOVA revealed 
that grouping of Argentineans according to the eight 
provinces they were sampled from explained a very small 
proportion, 0.92% (p = 0.00489), of the total autosomal 
genetic variation. No population substructure was detected 
when clustering individuals according to the three geographic 
regions to which these provinces belong (0.5% of variation 
between groups, p value = 0.264). 
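
The logit-scale regression reported above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the study: the individual ancestry proportions are hypothetical, the European logit is regressed on the Native American logit (the text does not state the direction), and `logit` and `linear_fit` are helper names introduced here.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def linear_fit(x, y):
    """Ordinary least squares of y on x: returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Hypothetical (Native American, European) ancestry proportions per individual;
# the remainder of each pair is the African component.
individuals = [(0.02, 0.95), (0.10, 0.86), (0.25, 0.70),
               (0.40, 0.57), (0.65, 0.30), (0.80, 0.15)]
x = [logit(native) for native, _ in individuals]
y = [logit(european) for _, european in individuals]
slope, intercept, r2 = linear_fit(x, y)
print(f"slope = {slope:.3f}, r-squared = {r2:.3f}")
```

The logit transform maps proportions from the unit interval onto the whole real line, which is why it is a common choice before fitting a straight line to compositional data.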

Y-chromosomal and mitochondrial DNA 

Investigation of NRY-SNPs ascertained to maximise the 
detection of continental geographic ancestry revealed 19 
NRY haplogroups in the Argentinean samples (Supporting 
Table S1 with their most likely continent of origin indicated). 
Across Argentinean samples the overall ancestry compo- 
nents as revealed from NRY data were 94.1% for European, 
Figure 3 Ancestry estimation, based on STRUCTURE 2.0, of the Argentinean samples 
assuming three parental populations: Native South Americans (AME), Europeans (EUR) and 
West Sub-Saharan Africans (AFR) compared with Argentineans from eight provinces (see 
Materials and Methods for acronym references). (a) Distruct-based bar plot where each bar 
represents an individual and the first three blocks the putative parental populations; (b) ternary 
plot in which each dot represents an Argentinean individual and each vertex a putative 
parental population. 



Table 1 Mean amount of continental autosomal ancestry (standard deviations) of Argentineans from each of the provinces and geographic 
regions based on STRUCTURE analysis 

Region          Province* / region    N     Native S-American   European        W Ss-African
North-Eastern   ACA                   1     67.3                31.4            1.3
                ACO                   21    17.61 (12.25)       77.45 (11.94)   4.96 (5.14)
                AFO                   11    21.93 (19.72)       74.74 (19.56)   3.34 (3.29)
                AMI                   28    13.35 (17.02)       82.40 (18.02)   4.25 (4.39)
                Subtotal Northern     61    17.25 (17.27)       78.48 (17.45)   4.28 (4.44)
Central         ABS                   150   15.03 (16.18)       80.80 (16.39)   4.18 (4.06)
                ASF                   3     18.87 (21.47)       77.47 (22.70)   3.63 (1.27)
                Subtotal Central      153   15.10 (16.21)       80.73 (16.45)   4.17 (4.03)
Southern        ARN                   31    27.64 (23.04)       68.51 (24.59)   3.84 (4.47)
                ACH                   1     30.7                66.4            2.9
                Subtotal Southern     32    27.74 (22.67)       68.45 (24.19)   3.82 (4.4)
Total                                 246   17.28 (17.84)       78.57 (18.24)   4.15 (4.17)

*ABS = Buenos Aires, ACA = Chaco, ACH = Chubut, ACO = Corrientes, AFO = Formosa, AMI = Misiones, ARN = Rio Negro, ASF = Santa Fe 

Table 2 Percentage of Y-chromosomal and mtDNA continental 
ancestry of Argentineans from three geographic regions 

                      Native American   European   African
mtDNA*
North-Eastern (61)    67.2              31.1       1.6
Central (153)         45.7              52.9       1.3
Southern (32)         65.6              28.1       3.1
Total (246)           53.7              44.3       2.0
NRY-DNA*
North-Eastern (61)    2.5               95.8       1.7
Central (153)         4.8               94.7       0.5
Southern (32)         10.9              87.5       1.6
Total (246)           4.9               94.1       0.9

*For NRY data the expected ancestry proportions given the frequency 
of each haplogroup in the three continents are shown, since 
for some NRY haplogroups no single continent of origin could be 
assigned (see Supporting Table S1 for details), whereas for mtDNA 
haplogroups single continents of ancestry could be unequivocally 
assigned and were used here. 
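
The expected-proportion calculation described in the Table 2 footnote can be sketched as below. This is an assumption-laden illustration: the haplogroup labels, counts and continental frequencies are hypothetical, and each haplogroup's continental frequencies are normalised to sum to one (equal weighting of the three continents), which the source does not specify.

```python
CONTINENTS = ("Native American", "European", "African")


def expected_ancestry(sample_counts, continental_freqs):
    """Expected continental ancestry proportions for a set of haplogroup counts.

    sample_counts: haplogroup -> number of carriers in the sample.
    continental_freqs: haplogroup -> {continent: frequency of that haplogroup}.
    Each carrier is split across continents in proportion to the haplogroup's
    frequency in each continent (normalisation is an assumption of this sketch).
    """
    total = sum(sample_counts.values())
    expected = dict.fromkeys(CONTINENTS, 0.0)
    for haplogroup, n in sample_counts.items():
        freqs = continental_freqs[haplogroup]
        norm = sum(freqs[c] for c in CONTINENTS)
        for c in CONTINENTS:
            expected[c] += n * freqs[c] / norm
    return {c: value / total for c, value in expected.items()}


# Hypothetical haplogroups: two with a single continent of origin, one ambiguous.
counts = {"hg_1": 90, "hg_2": 6, "hg_3": 4}
freqs = {
    "hg_1": {"Native American": 0.0, "European": 0.50, "African": 0.0},
    "hg_2": {"Native American": 0.80, "European": 0.0, "African": 0.0},
    "hg_3": {"Native American": 0.0, "European": 0.10, "African": 0.30},
}
result = expected_ancestry(counts, freqs)
for continent in CONTINENTS:
    print(f"{continent}: {100 * result[continent]:.1f}%")
```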




Figure 4 Proportions of Native American, European and West 
Sub-Saharan African ancestry of the Argentinean samples as 
inferred from NRY and mtDNA haplogroups and autosomal 
ASDM typing results analyzed by STRUCTURE. 



4.9% for Native American, and 0.9% for African ancestry 
(Table 2, Fig. 4). An AMOVA using NRY-SNP haplogroup 
data revealed no evidence of population substructure when 
considering the eight provinces from which the Argentineans 
were sampled (-1.57% of variation between groups, 
p value = 0.92), nor when clustering the provinces 
according to the three geographic regions to which they belong 
(2.2% of variation between groups, p value = 0.33). 

From the mtDNA data we identified 59 different mtDNA 
haplogroups among the 246 Argentineans (Supporting Table 
S2 with their most likely continent of origin indicated). 
Resulting overall mtDNA-based continental ancestry components 
were estimated at 44.3% European, 53.7% Native 
S-American, and 2.0% African (Table 2, Fig. 4). An AMOVA 
using mtDNA haplogroups showed that grouping of Argentineans 
according to their eight sampling provinces explained 
a small amount, 1.77% (p = 0.00489), of the total mtDNA 
variation, but no substructure was observed when clustering 
the provinces according to the three regions to which they belong 
(-0.17% of variation between groups, p value = 0.37). A 
Kruskal-Wallis test performed with the autosomal amount of 
Amerindian ancestry and clustering the individuals according 
to the geographic origin of their mtDNA was strongly statistically 
significant (χ² = 71.64, p value = 2.77e-16). A similar 
result was observed when the geographic origin of the NRY 
and mtDNA were taken into account together (χ² = 82.26, 
p value = 2.8e-16). 

The proportions of continental ancestry estimated from 
NRY and mtDNA data in the Argentinean samples were 
statistically significantly different from each other (Fisher exact 
test = 158.78, two-tailed p value = 4.89e-36), even when 
considering the ancestry of both loci at the individual level 
(Fisher exact test = 22.07; two-tailed p value = 0.016). In order 
to test whether the observed proportion of ancestry in the 
autosomal markers could produce the observed ancestry 
estimates in mtDNA and NRY, we assumed that the observed 
ancestry proportions of mtDNA and NRY were the outcome 
of a multinomial distribution with the success probability 
of each ancestry class given by that estimated in the autosomal 
markers. A statistically significant p value was observed when 
comparing the proportions of autosomal ancestry with the 
proportions observed in mtDNA (p = 4.65e-39), as well as 
when comparing with NRY (p = 7.96e-11). This suggests 
that it is quite unlikely that the amounts of admixture 
observed in the autosomal markers could by sampling chance 
produce the amounts of admixture that have been 
detected in the two sex-linked loci. 
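
The multinomial comparison described above can be illustrated with an exact test written in plain Python. This is a sketch under assumptions, not the authors' procedure: the mtDNA counts (132, 109, 5) are back-calculated here from the Table 2 percentages and N = 246, the autosomal proportions are the Table 1 totals, and the p value is defined as the summed probability of all outcomes no more likely than the observed one. It is not expected to reproduce the published p value exactly.

```python
import math


def log_multinomial_pmf(counts, probs):
    """Log probability of a vector of counts under a multinomial model.

    All entries of probs are assumed to be strictly positive.
    """
    log_p = math.lgamma(sum(counts) + 1)
    for k, p in zip(counts, probs):
        log_p -= math.lgamma(k + 1)
        if k > 0:
            log_p += k * math.log(p)
    return log_p


def exact_multinomial_p(observed, probs):
    """Exact three-category multinomial test.

    Sums the probabilities of every outcome that is no more likely than
    the observed one.
    """
    n = sum(observed)
    log_obs = log_multinomial_pmf(observed, probs)
    p_value = 0.0
    for a in range(n + 1):
        for b in range(n - a + 1):
            log_p = log_multinomial_pmf((a, b, n - a - b), probs)
            if log_p <= log_obs + 1e-9:  # tolerance for floating-point ties
                p_value += math.exp(log_p)
    return p_value


# Order: Native American, European, African.
autosomal_probs = (0.1728, 0.7857, 0.0415)  # Table 1 totals as proportions
mtdna_counts = (132, 109, 5)                # back-calculated, an assumption
p_mtdna = exact_multinomial_p(mtdna_counts, autosomal_probs)
print(f"exact multinomial p value: {p_mtdna:.3g}")
```

With three categories and 246 individuals there are about 30,000 possible outcomes, so full enumeration is cheap and no asymptotic approximation is needed.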



Distribution of Native American Genetic 
Ancestry Among Argentinean Paternal 
Surnames 

An AMOVA using the autosomal genetic data revealed that 
a small amount, 1.27% (p < 0.0005), of the total autoso- 
mal genetic variation is explainable by grouping the Argen- 
tinean samples according to the inferred geographic origin 
of the sample donors surnames. We further tested whether 
the density distribution of the proportion of the Native 
American autosomal ancestry was different depending on 
the geographic origin of the surname of each individual (see 
Supporting Fig. S2). Notably, there were individuals with 
90% European genetic ancestry carrying surnames of Native 
American origin as well as other individuals with a Spanish 
surname but 80% of Native American genetic ancestry. A 
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Kruskal- Wallis test performed with the amount of Native 
American genetic ancestry and the geographic origin of the 
European surnames was statistically significant (Kruskal- Wallis 
test p = 5.1e-005). Statistically significant differences (Fisher 
exact test p value = 0.002) were also observed in the conti- 
nental NRY ancestry depending on the continental ancestry 
of the surname. Whereas 96% of the individuals with Eu- 
ropean surnames carried European Y-chromosomes, 50% of 
the samples from individuals with Amerindian surnames had 
European Y chromosomes. 
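
The Kruskal-Wallis comparison reported above ranks all individuals by their Native American ancestry fraction and asks whether the rank sums differ between surname groups. A minimal sketch is given below. The three surname groups and all ancestry values are hypothetical placeholders (the per-individual data are not given in the text), the function names are illustrative, and the p value uses the chi-square approximation with two degrees of freedom, which is only valid for exactly three groups.

```python
from collections import Counter
from math import exp

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction

def chi2_sf_df2(x):
    """Survival function of a chi-square distribution with 2 degrees of freedom."""
    return exp(-x / 2)

# HYPOTHETICAL Native American ancestry fractions for three surname-origin groups.
spanish = [0.35, 0.22, 0.41, 0.18, 0.30]
italian = [0.05, 0.12, 0.08, 0.02, 0.10]
other_european = [0.04, 0.09, 0.15, 0.03, 0.07]

h = kruskal_wallis_h([spanish, italian, other_european])
print(f"H = {h:.2f}, approximate p = {chi2_sf_df2(h):.4f}")
```

With more than three groups, the chi-square survival function for the appropriate degrees of freedom would be needed in place of `chi2_sf_df2`.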



Discussion 

Although often considered the European people of South- 
America, historical records demonstrate that contemporary 
Argentineans are the result of genetic admixture processes 
involving three continental contributors: Native Americans, 
Western Africans and Europeans. However, robust genetic ad- 
mixture estimates using markers from uni- and bi-parentally 
inherited parts of the genome, as well as transmitted cultural 
markers such as the surname, in the same set of samples had 
not yet been conducted. Hence, we employed a wide range of 
DNA markers from the autosomes, the Y-chromosome and 
from mtDNA that are ancestry-sensitive for the three 
continental groups putatively involved in the Argentinean 
population history, in individuals sampled throughout the 
country. Examples from other parts of the world, such as the 
Pacific (Kayser et al., 2006; Kayser et al., 2008), have shown 
that analysing both uni-parental as well as the bi-parental 
parts of the genome is essential as the different parts of the 
genome can reveal different geographic ancestry components, 
providing important insights into different aspects of human 
population history. Evolutionary studies, association mapping, 
disease-risk prediction and forensic analysis are some of the 
research fields in which genetic admixture estimates are rel- 
evant (Sans, 2002; Liu et al., 2005; Yang et al., 2005; Wang 
et al., 2008). 

Our investigation focusing on the Argentinean population 
with a tri-parental model revealed a structure with varying 
genetic proportions depending on the genetic system 
analysed. Clustering analyses performed on data from 
autosomal ancestry-sensitive SNPs, which provide a somewhat 
approximate representation of the bi-parentally inherited part 
of the genome, demonstrated a major European component 
(overall 78.5%) in the pooled Argentinean sample, whereas the 
Native American component (overall 17.3%) was lower but 
considerable, and the African component was very small (overall 
4.1%). Very similar values were obtained previously by Seldin 
et al. (2007) in a smaller sample of smaller geographic distri- 
bution than was analysed by us. Notably, their results were 
achieved with 54 more autosomal ancestry-sensitive SNPs 
than were applied in the present study, illustrating that our 
24 ASM-SNPs may contain more continental ancestry in- 
formation. Slightly lower proportions of European ancestry, 
together with slightly higher proportions of Native Ameri- 
can and African ancestry were observed for individuals with 
European surnames from La Plata City, but only five ancestry- 
sensitive markers were used (Martinez-Marignac et al., 2004). 
However, despite this general trend in the genetic variation, 
it should be noted that our analyses revealed a considerable 
genetic heterogeneity at the individual level, also noticed 
before in a smaller number of Argentinean samples (Seldin 
et al., 2007). About 40% of the sampled individuals attained 
more than 90% of European ancestry, but the Native Amer- 
ican proportion was as high as 80% albeit this was observed 
only in a single individual. Such findings reflect the dynamics 
of the recent demographic history of the extant Argentinean 
population, indicating that individuals retaining a higher pro- 
portion of European ancestry could be descendants from the 
recent newcomers to the Argentinean population, whereas 
those with a higher amount of Native American admixture 
could be descendants from the first contact between the Euro- 
pean and Amerindian populations, starting approximately 500 
years ago. Much lower European, together with much higher 
Native American, ancestry proportions than were observed 
on average here were detected in three Argentinean groups 
by Wang et al. (2008) using 751 autosomal short-tandem re- 
peat (STR) polymorphisms, and the discrepancies may in part 
be related to differences between the samples used. 

When considering lineage-specific genetic markers, a dif- 
ferent picture was observed, depending on whether these were 
maternally or paternally inherited. Based on NRY data the 
overall European component was very high (overall 94.1%); 
in particular 1.2 times higher than has been established from 
autosomal markers, whereas the Native American compo- 
nent (overall 4.9%) was very low, in particular 3.5 times lower 
than from autosomal DNA. The African proportion (over- 
all 0.9%) was about 4.7 times lower than was detected with 
autosomal DNA. So far, studies focused on the amount of 
male lineage ancestry in the Argentinean population have 
only analysed the presence of Native American motifs, char- 
acterised by the mutation M3, depicting a C to T transi- 
tion at locus DYS199 (Underhill et al., 1996). Our results 
corroborate the limited proportion of Native American an- 
cestry observed by Salas et al. (2008) using NRY-STRs in 
samples from Cordoba (Fisher exact test p value = 0.178). 
However, our results showed statistically significant 
differences (Fisher exact test p value = 1.819e-40) with the 
Native American proportions observed in individuals with 
European surnames from the city of La Plata (Martinez-Marignac 
et al., 2004) and with those observed by Corach et al. (2006) 
using samples from three different geographic regions (Fisher 
exact test p value = 1.302e-06). These discrepancies might be 
explainable by the different sampling ascertainment schemes 
of the samples used. In particular, samples used in the present 
study came from private paternity testing afforded by the 
participants themselves, whereas those from the Corach et al. 
(2006) study came from forensic case work. A socio-economic 
bias associated with bio-geographic ancestry could explain the 
differences of the ancestry estimations in the two studies. The 
previous and current results together suggest that individuals 
with a Native American Y chromosome may be more often 
associated with forensic case work, which could imply that the 
combination of socio-economic factors and bio-geographical 
ancestry may be acting as a confounding factor. 
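
The between-study Fisher exact comparisons quoted above can be illustrated for the simplest case, a 2 x 2 table of study by Native American versus other Y-chromosome ancestry. The sketch below is an assumption-laden illustration: the counts are hypothetical placeholders (the underlying counts are not given in the text), the function name is illustrative, and the two-sided p value is defined as the sum of the probabilities of all tables no more likely than the observed one.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# HYPOTHETICAL counts: (Native American Y, other Y) in two differently ascertained samples.
paternity_sample = (12, 234)
forensic_sample = (30, 170)

p = fisher_exact_2x2(*paternity_sample, *forensic_sample)
print(f"two-sided Fisher exact p = {p:.3g}")
```

Tables with more than two ancestry classes, as used elsewhere in this study, require an r x c generalisation that this sketch does not cover.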

In contrast to autosomal and NRY data, the analysis of 
mtDNA haplogroups identified Native American ancestry as 
a major component (overall 53.7%), which was 3.1 times 
higher than detected with autosomal markers and 11 times 
higher than with NRY markers. Consequently, the Euro- 
pean contribution detected with mtDNA (overall 44.3%) was 
1.8 times lower than with autosomal markers and 2.1 times 
lower than with NRY markers. The African mtDNA propor- 
tion (overall 2.0%) was about half of what we detected based 
on autosomal data but about twice that estimated for NRY 
DNA. Previous mtDNA-based estimates of Native American 
ancestry in Argentineans are similar to ours (Martinez-Marignac 
et al., 2004: Fisher exact test p value when compared with 
the current study = 0.389; Corach et al., 2006: Fisher exact 
test p value = 0.355). The latter would imply that 
the geographic ancestry of mtDNA is independent from the 
assumed socio-economic differences in forensic versus pater- 
nity testing sampling in contrast to the NRY findings. 
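
The fold differences quoted in the two preceding paragraphs follow directly from the overall ancestry percentages reported in this study. The short sketch below recomputes them; the dictionary layout and the function name are illustrative choices, and only the percentages themselves come from the text.

```python
# Overall ancestry percentages per genetic system, as reported in the text.
ancestry = {
    "autosomal": {"European": 78.5, "Native American": 17.3, "African": 4.1},
    "NRY": {"European": 94.1, "Native American": 4.9, "African": 0.9},
    "mtDNA": {"European": 44.3, "Native American": 53.7, "African": 2.0},
}

def fold(numerator_system, denominator_system, component):
    """Ratio of one ancestry component between two genetic systems."""
    return ancestry[numerator_system][component] / ancestry[denominator_system][component]

comparisons = [
    ("NRY", "autosomal", "European"),
    ("autosomal", "NRY", "Native American"),
    ("mtDNA", "autosomal", "Native American"),
    ("mtDNA", "NRY", "Native American"),
    ("autosomal", "mtDNA", "European"),
    ("NRY", "mtDNA", "European"),
]
for num, den, component in comparisons:
    print(f"{component}: {num} / {den} = {fold(num, den, component):.2f}")
```

Ratios involving the small African percentages are sensitive to rounding of the inputs, so values recomputed from the rounded percentages may differ slightly from those quoted.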

The presence of large differences in the continental ancestry 
proportions detected with uni-parentally inherited markers, 
as we also found here in Argentineans, seems to be a common 
observation within the Latin American countries (Batista dos 
Santos et al., 1999; Martinez-Marignac et al., 2004; Bertoni 
et al., 2005; Campos-Sanchez et al., 2006). Typically, the 
proportion of Native American ancestry is low for male lin- 
eages but high for female ones, whereas European ancestry is 
high for male lineages but low for female ones, as a conse- 
quence of sex-mediated ancestry differences in the admixture 
history of the population. In our case, soon after the first Eu- 
ropean contact with the current territory of Argentina, the 
original population was dramatically affected by a severe re- 
duction of the Native American male population, as a result 
of the conquest. Consequently, an increased proportion of 
offspring from European males and Native American women 
were born, due in part to low European female population 
size and the reproductive preponderance of the European in- 
vaders. This situation might reflect the political decisions by 
the Spanish crown for implementing a strategy for population 
growth and colonial occupation of the invaded territories. 
Additional social factors, also in subsequent periods, limited 
Native Amerindian male genetic flow into the admixed 
population. 

It is interesting to note that although the number of West- 
Africans who were introduced to the territory of contempo- 
rary Argentina by European slave traders between 1580 and 
1813 was large (i.e. 100,000 for the La Plata and Bolivia 
region; Rawley & Behrendt, 2005), and slavery was abolished in 
1853, the African component we detected in our Argentinean 
samples was very low with any of the three genetic systems 
applied. This result indicates a low degree of African admix- 
ture in the general Argentinean population, which is differ- 
ent to North American countries such as the United States 
where African admixture components in European 
Americans are up to about one third (Kittles et al., 2003). Hence, a 
stronger social barrier may have existed in Argentina, resulting 
in a lower number of offspring from parents of West African 
and European (or mixed European-Native American) ances- 
try than that which occurred in North-America. Moreover, 
these results could also suggest the presence of demographic 
pressures against individuals carrying a large West African 
ancestry in Argentina. In particular, the “freedom of wombs” 
law (children of slaves were born free) established by the Con- 
stituent Assembly in 1813, might have stimulated slave mas- 
ters to sell their female slaves outside the country, reducing 
the matrilineages of African ancestry. It should be noted that 
slave trading was prohibited from 1813 but slavery was not 
abolished until 1853. In addition, Argentineans of African 
descent were recruited as soldiers during the independence war 
between 1810 and 1818, and in the war of the Triple Alliance 
(Argentina, Uruguay and Brazil against Paraguay) between 
1864 and 1870, with high mortality rates. These wars reduced 
the number of African males, hence reducing African patri- 
lineages. Finally, the impact of cholera (1861 and 1864) and 
yellow fever (1871) epidemics especially affected the poor- 
est parts of Argentinean society, which included most of the 
people of African descent. 

In addition to using genetic diversity for quantifying bio- 
geographic ancestry and admixture, we have also analysed 
the surnames of the sample donors as a paternally inherited 
social marker of ancestry. Our results demonstrate a trend be- 
tween the geographic origin of the surnames and the amount 
of autosomal Native Amerindian ancestry, particularly with 
Spanish surnames, which may be explainable in the context 
of European arrival times. Male Spanish conquerors were the 
first to meet with the aboriginal Amerindian population and 
therefore had most time to admix with them, whereas addi- 
tional non-Spanish European sub-populations arrived much 
later in Argentina, with less time for establishing 
admixture. However, we also detected some outliers with high 
proportions of European ancestry and an Amerindian surname, 
and vice versa, which may be explained by random 
genetic drift or illegitimate paternities. Kidnapping of people of 
European descent by Native Americans, as happened 
especially during the first half of the 19th century, may be another 
explanation for this phenomenon. These captives were forced 
to live in Native American communities and not only adopted 
their lifestyle but also received Amerindian names; however, 
the frequency of such events was too low (Martinez Sara- 
sola, 2005) to explain our observation. On the contrary, Eu- 
ropean surnames associated with large Amerindian genetic 
components may potentially reflect a different scenario: most 
of the Native Americans lacked a composite name and may 
have been given Spanish surnames from e.g. encomenderos, 
administrative officers and clergymen. In addition some peo- 
ple of Native American descent might have decided to change 
their surnames into European ones in order to avoid social dis- 
crimination. An alternative explanation would be adoptions 
of Native Americans by people of European ancestry. 

In conclusion, we found that the contemporary 
Argentinean population carries a major European ancestry 
component in their autosomal DNA, and even more so in 
their Y-chromosomal gene pool, in agreement with prior 
expectations. In contrast, most of the Argentinean 
mitochondrial gene pool was of Native American ancestry. The African 
genetic ancestry of Argentineans was very low based on all 
three genetic systems, which is remarkable given the large 
number of West African slaves brought to the region. Differ- 
ences between the amounts of European and Native Amer- 
ican ancestry components detected using paternal NRY and 
maternal mtDNA markers are in line with the sex-biased ad- 
mixture history involving predominantly European men and 
Native American women at least in the early periods of Euro- 
pean contact. In the Argentinean population only very small 
amounts of population substructure in respect of the eight 
sampled provinces were observed based on autosomal and 
mtDNA data (none based on NRY data), and no substructure 
was detected with any of the three genetic systems when clustering 
the provinces into the three geographic regions to which they 
belong. The complex genetic ancestry picture revealed in our 
study underscores the need for the combined use of ancestry- 
sensitive markers of both uni-parental as well as of bi-parental 
transmission in order to obtain more accurate inferences of 
bio-geographic ancestry, which is relevant in epidemiological, 
historical and forensic studies. This is important not only in 
other South American countries that underwent similar events of 
sex-biased admixture, but also in all other countries where 
people of different continental origin live in close geographic 
proximity (such as the United States of America), allowing 
continental admixture to occur. 
Our data also show that using surnames as a proxy for ancestry 
inferences (or “ethnic affiliation”) might be misleading; 
however, their use may supplement genetic information arising 
from ancestry sensitive markers and may provide interesting 
insights into the social behaviour of a population. 